This is an R Markdown document demonstrating the analysis of the data from the Biorxiv paper “Universal Amplicon Sequencing of North Imperial Valley Wetlands Microbiomes”.
First we analyze how our Universal Amplicon (UA) method performs on
the Zymo
Mock Community by building a bar chart that compares our
experimental distribution of reads to the theoretical or exepected
distribution of the Zymo Mock community.
Figure 1A: Comparison of known composition for the Zymo mock community(far-right bar) compared to 28 sequencing runs using the UA primers described in this paper. Perfect quantitation with the UA method would be an exact match of a given bar to the far-right bar
Figure 1B: Box and whisker plots showing abundance of each organism within the Zymo mock community over all 28 runs compared to the theoretical or expected abundance of each organism within the mock community.
Now lets combine the two figures above
To further validate the Universal Amplicon (UA) approach
we collected data from the following sites around the North Imperial
Valley.
| Sampling Station Name | Site ID | Latitude | Longitude | Average Water Filtered (mL) | Standard Deviation (mL) |
|---|---|---|---|---|---|
| Morton Bay | IVF005 | 33.20168 | -115.5971 | 59 | 15 |
| Salton Sea Beach | IVF006 | 33.16197 | -115.6470 | 103 | 56 |
| Alamo River | IVF007 | 33.17667 | -115.5760 | 162 | 75 |
| Alamo River | IVF017 | 33.19911 | -115.5970 | 162 | 75 |
| Salton Sea Obsidian Butte | IVF008 | 33.17525 | -115.6385 | 94 | 36 |
| N MacDonald Rd | IVF010 | 33.20634 | -115.5654 | 141 | 39 |
| Managed Marsh | IVF016 | 33.20590 | -115.5271 | 168 | 77 |
| Managed Marsh | IVF022 | 33.19866 | -115.5276 | 168 | 77 |
Of those sites we selected four time points from Morton Bay to performed shotgun sequencing and uncover any potential biases with the UA in real world data.
| Sample Name | Sampling Station | Site ID | Sampling Date | Extracted gDNA Concentration (ng/ul) |
|---|---|---|---|---|
| IVF005_2018.04.30 | Morton Bay | IVF005 | 2018-04-30 | 22.4 |
| IVF005_2018.05.30 | Morton Bay | IVF005 | 2018-05-30 | 100.0 |
| IVF005_2018.06.26 | Morton Bay | IVF005 | 2018-06-26 | 19.4 |
| IVF005_2018.07.25 | Morton Bay | IVF005 | 2018-07-27 | 45.3 |
Figure 2: Comparison of shotgun sequencing versus UA characterization of
microbial community composition for Morton Bay samples from 2018-04-30,
2018-05-30, 2018-06-26, and 2018-07-25. Heatmap representation of the
fraction of sequencing reads corresponding to the given Phyla. Our UA
reads contained a significant quantity of reads that could not be
classified any more specifically than the Kingdom of Fungi, and hence
this kingdom appears adjacent to other phyla.
Figure 2v2: Comparison of shotgun sequencing versus UA
characterization of microbial community composition for Morton Bay
samples from 2018-04-30, 2018-05-30, 2018-06-26, and 2018-07-25 by
fraction of total reads visualized as a bar plot colored by Phylum
assignment
Now that we have confirmed that the fraction of total reads from each
phylum overlaps between shotgun metagenomics and the UA amplicon method
in identical samples we can expand our analysis to test the UA primers
of real samples collected from the North Imperial Valley of California.
First we will look at the relationship between all our samples using a Non-metric multidimensional scaling ordination of each sample.
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.08649805
## Run 1 stress 0.0864567
## ... New best solution
## ... Procrustes: rmse 0.02521667 max resid 0.2535512
## Run 2 stress 0.08649724
## ... Procrustes: rmse 0.02489518 max resid 0.2489992
## Run 3 stress 0.09493234
## Run 4 stress 0.09206775
## Run 5 stress 0.08658861
## ... Procrustes: rmse 0.02306773 max resid 0.258522
## Run 6 stress 0.08649739
## ... Procrustes: rmse 0.02488475 max resid 0.248925
## Run 7 stress 0.0864968
## ... Procrustes: rmse 0.02493382 max resid 0.2495259
## Run 8 stress 0.09020723
## Run 9 stress 0.08628265
## ... New best solution
## ... Procrustes: rmse 0.01137106 max resid 0.1236761
## Run 10 stress 0.08649769
## ... Procrustes: rmse 0.02278985 max resid 0.255077
## Run 11 stress 0.0872873
## Run 12 stress 0.08669207
## ... Procrustes: rmse 0.02807586 max resid 0.2626092
## Run 13 stress 0.09291213
## Run 14 stress 0.09080042
## Run 15 stress 0.09128063
## Run 16 stress 0.09321587
## Run 17 stress 0.08645556
## ... Procrustes: rmse 0.01138764 max resid 0.1236963
## Run 18 stress 0.09425672
## Run 19 stress 0.08669289
## ... Procrustes: rmse 0.02804728 max resid 0.2619809
## Run 20 stress 0.0872889
## *** No convergence -- monoMDS stopping criteria:
## 4: no. of iterations >= maxit
## 16: stress ratio > sratmax
Figure 3A: Non-metric multidimensional scaling ordination of each
sample. Samples are colored by station ID.
Figure A (Plotly 3D): Recreation of Figure 3A in 3D using NMDS axis
1, 2 and 3 as calculated via phyloseq ordinate
Next lets look at the Shannon Alpha diversity at each station as well as perform a rarefication analysis across sample sites using both observed and shannon diversity.
## Depth Sample
## Min. : 1 SGI_V4.UA_run1_2018.05.11_IVF008_2018.03.22: 300
## 1st Qu.: 10 SGI_UA_run27_2019.12.18_IVF008_20190905 : 260
## Median : 1000 SGI_UA_run5_2018.11.02_IVF006_2018.08.24 : 260
## Mean : 11336 SGI_V4.UA_run1_2018.05.11_IVF006_2018.03.22: 260
## 3rd Qu.: 20000 SGI_UA_run27_2019.12.18_IVF008_20191001 : 240
## Max. :110000 SGI_UA_run27_2019.12.18_IVF006_20190905 : 220
## (Other) :15920
## Measure Alpha_diversity
## Observed:8730 Min. : 0.000
## Shannon :8730 1st Qu.: 3.030
## Median : 5.115
## Mean : 136.275
## 3rd Qu.: 174.000
## Max. :1230.000
##
## Depth Sample Measure Alpha_diversity
## Min. : 1 Length:8730 Shannon:8730 Min. :0.000
## 1st Qu.: 10 Class :character 1st Qu.:2.164
## Median : 1000 Mode :character Median :3.947
## Mean : 11336 Mean :3.373
## 3rd Qu.: 20000 3rd Qu.:4.726
## Max. :110000 Max. :6.268
Supplemental Figure 1: Rarefication analysis across sample sites
with predicted,observed,and Shannon diversity calculated across
sequencing depth. Points are colored by unique sampling
date.
Supplemental Figure 5: Shannon diversity(calculated using
phyloseq’s plot_richness) function across all sample sites.
Now we will look at phylum and kingdom level bar charts over time for each sample location. The two IVF locations from Alamo River and Managed Marsh have been combined for visualization.
Figure 3A: Bar 264plot of ASVs over time at each station agglomerated to
the Phylum level. Vertical lines mark where specific sampling
265locations for the Alamo River and Managed Marsh changed due to
previous locations becoming inaccessible.
Supplemental Figure 3Bar plot of ASVs over time at each station
agglomerated to the Kingdom leve
We can also produce a heatmap using the 75 most abundance Amplicon Sequence Variants (ASVs) agglomerated to the class levels and sorted by sample name showing clear differentiation between sites.
## NULL
Supplemental Figure 2: Heatmap of the top 75 ASVs agglomerated to theclass levelcolumns sorted by sample name.
Zooming in on a select group of samples from March through June 2018 at Obsidian Butte shows some temporal differentiation.
Alpha diversity spikes in the May 15th samples.
Figure 4A: Seven Alpha diversity indexes shown side by side to assess evenness and richness of the community. X axis reflects 298dates of interest and y axis the calculated alpha diversity. The higher the value of y the more diverse the community is. The 299measures used include Observed diversity index, Chao1 diversity index, ACE diversity index, Shannon diversity index, Simpson 300diversity index, Inverse Simpson diversity index and Fisher diversity index.
Next we have bar charts at Obsidian Butte from March to June at the phylum level and lowest assigned.
Figure 4B: ASV abundance at Obsidian Butte from March through Juneof
2018. ASV abundances were agglomeratedto the phylum
level.
Figure 4B: ASV abundance at Obsidian Butte from March through Juneof
2018. ASV abundances were agglomeratedto the phylum
level.
Finally we zoom in on some manually curated unique facet groups to look at change over time at Obsidian Butte. These plots show Obsidian Butte Phylum level abundances from March through Juneof 2018 separated into unique groups.Groups were chosen based on known correlations between certain species in the literature and overlapping trends in abundances.
Supplemental Figure 6A: Phylum level Group 1 abundance from March
to June 2018
Supplemental Figure 6B: Phylum level Group 2 abundance from March
to June 2018
Supplemental Figure 6C: Phylum level Group 3 abundance from March
to June 2018